Grammar-Based and Lexicon-Based Techniques to Extract Personality Traits from Text
Abstract
Language provides an important source of information for predicting human personality. However, most studies that have predicted personality traits using computational linguistic methods have focused on lexicon-based information. We investigate how lexicon-based and grammar-based methods compare in performance when predicting personality traits. We analyzed a corpus of student essays and the students' personality traits using two lexicon-based approaches, one top-down (Linguistic Inquiry and Word Count (LIWC)) and one bottom-up (topic models), and one grammar-driven approach (the Biber model), as well as combinations of these models. The models and their combinations performed similarly: lexicon-based top-down and bottom-up models did not differ, and neither did lexicon-based and grammar-based models. Moreover, combining models did not improve performance. These findings suggest that predicting personality traits from text remains difficult, but that the performance of lexicon-based and grammar-based models is on par.
Similar resources
A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three methods have conventionally been used to extract named entities from a text: rule-based, machine-learning-based, and hybrid. Machine-learning-based methods perform well in the Persian language if they are trained with good features. To get good performanc...
Integrating Probabilistic and Knowledge-based Approaches to Corpus Parsing
We have developed a prototype system for syntactic parsing of corpus text based on a wide-coverage unification-based grammar of English and domain-independent statistical techniques for selecting the most plausible parses from the typically large number licensed by the grammar. Although the results from initial experiments are promising, the system is ‘brittle’, relying particularly on the corr...
Imam Sadegh’s (AS) Hadiths in Sunni’s lexicon
The Quran and the Hadiths, including those of the Infallibles (AS) such as Imam Sadegh (AS), have long been both a reference for compilation and a field of research for Arab morphologists. This research illustrates Imam Sadegh’s (AS) Hadiths as they appear in Sunni lexicons, and then in other books of Islamic sciences, in order to identify where these Hadiths hav...
Lexicon Acquisition with and for Symbolic NLP-Systems – a Bootstrapping Approach
We present a method of applying a broad-coverage LFG grammar of German in the process of semi-automatic lexicon acquisition from corpora. The identification of corpus instances that illustrate a certain subcategorization frame uniquely is done by a comparison of the numbers of analyses the grammar assigns to the corpus instances, under the assumption of different hypothetical lexicon entries fo...
The Role of Textual vs. Compound Input Enhancement in Developing Grammar Ability
The present study compared the impact of two types of input enhancement (i.e. textual vs. compound enhancement) on developing grammar ability in an Iranian EFL setting. Sixty-five female secondary high school students were selected as a homogeneous sample from a population of about 100, based on the Nelson language proficiency test. Then, their grammar ability was measured based ...